First of all, import the datasets I obtained from ORFEO:
library(dplyr)  # provides %>% and filter(), used in the model fits below
bcast_default.df<- read.csv(file = "bcast_default.csv")
bcast_linear.df<- read.csv(file = "bcast_linear.csv")
bcast_chain.df<- read.csv(file = "bcast_chain.csv")
bcast_binarytree.df<- read.csv(file = "bcast_binarytree.csv")
Now let’s bind them into a single data frame:
bcast.df<- rbind(bcast_default.df,bcast_linear.df,bcast_chain.df,bcast_binarytree.df)
For now, let’s fix the message size to 1 MPI_CHAR:
As we did above for the linear broadcast, we fix the message size, this time to 2 MPI_CHAR:
For now, let’s focus on the comparison between Linear and Chain:
The difference is not particularly evident. But if we now increase the message size, the different performance of the two algorithms becomes clear:
Since allocation by core is associated with the lowest latency, let’s fix Allocation to core and compare the algorithms behind the MPI broadcast operation. With a message size of 1 MPI_CHAR we obtain:
If instead we let the message size vary, we get a picture similar to the one seen in Linear vs Chain:
Now we fix the allocation to core, as it was the best-performing allocation for all the algorithms. Our aim is to fit a performance model able to predict latency from the data.
First of all, let’s plot the variables to spot possible relationships:
We can start with a simple linear model (here it makes sense to remove the intercept):
fit.linear<- lm(formula= Latency ~ -1+MessageSize+Processes,
data=bcast_linear.df %>% filter(Allocation == "core")
)
summary(fit.linear)
##
## Call:
## lm(formula = Latency ~ -1 + MessageSize + Processes, data = bcast_linear.df %>%
## filter(Allocation == "core"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1133.70 -32.02 -20.19 -9.89 650.40
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## MessageSize 1.136e-03 2.454e-05 46.294 < 2e-16 ***
## Processes 1.083e+00 2.269e-01 4.775 2.36e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 137.4 on 502 degrees of freedom
## Multiple R-squared: 0.838, Adjusted R-squared: 0.8373
## F-statistic: 1298 on 2 and 502 DF, p-value: < 2.2e-16
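Because the intercept was removed, the R-squared printed above is not the usual one: for models without intercept, `summary.lm()` compares the residual sum of squares with \(\sum y^2\) instead of \(\sum (y-\bar y)^2\). A minimal sketch with hypothetical toy data (not the ORFEO measurements; `x`, `y` and `fit_no_int` are illustrative names) makes the difference visible:

```r
# Hypothetical toy data (NOT the ORFEO measurements): y has a clear offset of about 10
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(10.2, 10.9, 11.4, 12.1, 12.4, 13.2, 13.5, 14.3)

fit_no_int <- lm(y ~ -1 + x)   # intercept removed, as in fit.linear
rss <- sum(resid(fit_no_int)^2)

# For a no-intercept model, summary.lm() compares the RSS with sum(y^2) (uncentered R^2)
r2_reported   <- summary(fit_no_int)$r.squared
r2_uncentered <- 1 - rss / sum(y^2)

# The usual (centered) definition compares the same RSS with the spread around mean(y)
r2_centered <- 1 - rss / sum((y - mean(y))^2)

round(c(reported = r2_reported, uncentered = r2_uncentered, centered = r2_centered), 3)
```

The reported and hand-computed uncentered values coincide and are high, while the centered value for the same fit is negative, i.e. worse than simply predicting the mean. This is worth keeping in mind when reading the R-squared of the no-intercept models in this section.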
Let’s take a look at the marginal distribution of our response variable:
EVIDENT SKEWNESS! Let’s try a log2 transformation (in the model I apply it to both Latency and MessageSize):
The linear model becomes:
fit.linear.log<- lm(formula= log2(Latency) ~ -1+Processes+log2(MessageSize),
data=bcast_linear.df %>% filter(Allocation == "core")
)
summary(fit.linear.log)
##
## Call:
## lm(formula = log2(Latency) ~ -1 + Processes + log2(MessageSize),
## data = bcast_linear.df %>% filter(Allocation == "core"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4898 -1.1118 -0.5450 0.3623 3.6810
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Processes 0.029815 0.003808 7.83 2.92e-14 ***
## log2(MessageSize) 0.321259 0.009308 34.51 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.62 on 502 degrees of freedom
## Multiple R-squared: 0.8824, Adjusted R-squared: 0.8819
## F-statistic: 1883 on 2 and 502 DF, p-value: < 2.2e-16
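Back-transforming shows what the log2 model assumes: a power law in the message size and an exponential effect of the number of processes. A short derivation (\(\beta_P\) and \(\beta_M\) are labels I introduce for the two coefficients; the numbers are the estimates printed above):

\[
\log_2(\text{Latency}) = \beta_P\,\text{Processes} + \beta_M \log_2(\text{MessageSize})
\;\Longrightarrow\;
\text{Latency} = 2^{\beta_P\,\text{Processes}} \cdot \text{MessageSize}^{\beta_M}
\]

\[
\hat\beta_M = 0.321259 \;\Rightarrow\; 2^{\hat\beta_M} \approx 1.25,
\qquad
\hat\beta_P = 0.029815 \;\Rightarrow\; 2^{\hat\beta_P} \approx 1.021
\]

So, under this model, doubling the message size multiplies the predicted latency by about 1.25, and each additional process multiplies it by about 1.021.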
Adding a quadratic term for log2(MessageSize):
fit.linear.log.quadratic<- lm(formula= log2(Latency) ~ -1+Processes+log2(MessageSize)+I(log2(MessageSize) ^2),
data=bcast_linear.df %>% filter(Allocation == "core")
)
summary(fit.linear.log.quadratic)
##
## Call:
## lm(formula = log2(Latency) ~ -1 + Processes + log2(MessageSize) +
## I(log2(MessageSize)^2), data = bcast_linear.df %>% filter(Allocation ==
## "core"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.68339 -0.46033 -0.00415 0.43852 1.22333
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Processes 0.0718655 0.0019746 36.40 <2e-16 ***
## log2(MessageSize) -0.3022613 0.0147847 -20.44 <2e-16 ***
## I(log2(MessageSize)^2) 0.0355722 0.0008083 44.01 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7354 on 501 degrees of freedom
## Multiple R-squared: 0.9758, Adjusted R-squared: 0.9757
## F-statistic: 6741 on 3 and 501 DF, p-value: < 2.2e-16
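Comment: I managed to increase \(R^2_{adj}\) from 0.8373 to 0.9757 :) Keep in mind, though, that the two models have different responses (Latency vs log2(Latency)) and no intercept (so R reports an uncentered \(R^2\)), so this comparison is only indicative.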
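The same back-transformation for the quadratic model shows that the exponent of the message size is no longer constant. Writing \(L\) = Latency, \(P\) = Processes, \(M\) = MessageSize, and using \(\beta_Q\) as my label for the coefficient of I(log2(MessageSize)^2):

\[
\log_2 L = \beta_P P + \beta_M \log_2 M + \beta_Q (\log_2 M)^2
\;\Longrightarrow\;
L = 2^{\beta_P P}\, M^{\beta_M + \beta_Q \log_2 M}
\]

\[
\frac{\partial \log_2 L}{\partial \log_2 M} = \beta_M + 2\beta_Q \log_2 M
\]

For the linear algorithm (\(\hat\beta_M = -0.3022613\), \(\hat\beta_Q = 0.0355722\)) at MessageSize = 1024 this local log-log slope is \(-0.3022613 + 2 \cdot 0.0355722 \cdot 10 \approx 0.41\), and it keeps growing with the message size.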
Let’s fit the same two models to the chain algorithm:
fit.chain.log<- lm(formula= log2(Latency) ~ -1+Processes+log2(MessageSize),
data=bcast_chain.df %>% filter(Allocation == "core")
)
summary(fit.chain.log)
##
## Call:
## lm(formula = log2(Latency) ~ -1 + Processes + log2(MessageSize),
## data = bcast_chain.df %>% filter(Allocation == "core"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3286 -1.3089 -0.5489 0.6052 3.1535
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Processes 0.021957 0.003739 5.873 7.8e-09 ***
## log2(MessageSize) 0.306713 0.009139 33.560 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.591 on 502 degrees of freedom
## Multiple R-squared: 0.8683, Adjusted R-squared: 0.8677
## F-statistic: 1654 on 2 and 502 DF, p-value: < 2.2e-16
fit.chain.log.quadratic<- lm(formula= log2(Latency) ~ -1+Processes+log2(MessageSize)+I(log2(MessageSize)^2),
data=bcast_chain.df %>% filter(Allocation == "core")
)
summary(fit.chain.log.quadratic)
##
## Call:
## lm(formula = log2(Latency) ~ -1 + Processes + log2(MessageSize) +
## I(log2(MessageSize)^2), data = bcast_chain.df %>% filter(Allocation ==
## "core"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.44302 -0.33516 0.02293 0.36399 1.19481
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Processes 0.0646477 0.0016598 38.95 <2e-16 ***
## log2(MessageSize) -0.3262914 0.0124275 -26.26 <2e-16 ***
## I(log2(MessageSize)^2) 0.0361133 0.0006794 53.15 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6181 on 501 degrees of freedom
## Multiple R-squared: 0.9802, Adjusted R-squared: 0.98
## F-statistic: 8248 on 3 and 501 DF, p-value: < 2.2e-16
The same two models for the binary tree algorithm:
fit.binarytree.log<- lm(formula= log2(Latency) ~ -1+Processes+log2(MessageSize),
data=bcast_binarytree.df %>% filter(Allocation == "core")
)
summary(fit.binarytree.log)
##
## Call:
## lm(formula = log2(Latency) ~ -1 + Processes + log2(MessageSize),
## data = bcast_binarytree.df %>% filter(Allocation == "core"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4247 -1.3128 -0.6351 0.5242 3.6288
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Processes 0.016315 0.003766 4.332 1.79e-05 ***
## log2(MessageSize) 0.317449 0.009207 34.480 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.603 on 502 degrees of freedom
## Multiple R-squared: 0.8663, Adjusted R-squared: 0.8658
## F-statistic: 1626 on 2 and 502 DF, p-value: < 2.2e-16
fit.binarytree.log.quadratic<- lm(formula= log2(Latency) ~ -1+Processes+log2(MessageSize)+I(log2(MessageSize)^2),
data=bcast_binarytree.df %>% filter(Allocation == "core")
)
summary(fit.binarytree.log.quadratic)
##
## Call:
## lm(formula = log2(Latency) ~ -1 + Processes + log2(MessageSize) +
## I(log2(MessageSize)^2), data = bcast_binarytree.df %>% filter(Allocation ==
## "core"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.44209 -0.28928 0.00979 0.27741 1.15603
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Processes 0.0600791 0.0014947 40.19 <2e-16 ***
## log2(MessageSize) -0.3314744 0.0111918 -29.62 <2e-16 ***
## I(log2(MessageSize)^2) 0.0370215 0.0006119 60.50 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5567 on 501 degrees of freedom
## Multiple R-squared: 0.9839, Adjusted R-squared: 0.9838
## F-statistic: 1.021e+04 on 3 and 501 DF, p-value: < 2.2e-16
And finally for the default broadcast:
fit.default.log<- lm(formula= log2(Latency) ~ -1+Processes+log2(MessageSize),
data=bcast_default.df %>% filter(Allocation == "core")
)
summary(fit.default.log)
##
## Call:
## lm(formula = log2(Latency) ~ -1 + Processes + log2(MessageSize),
## data = bcast_default.df %>% filter(Allocation == "core"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4849 -1.2933 -0.5753 0.5509 3.2011
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Processes 0.009586 0.003781 2.535 0.0115 *
## log2(MessageSize) 0.321418 0.009242 34.778 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.609 on 502 degrees of freedom
## Multiple R-squared: 0.8592, Adjusted R-squared: 0.8586
## F-statistic: 1532 on 2 and 502 DF, p-value: < 2.2e-16
fit.default.log.quadratic<- lm(formula= log2(Latency) ~ -1+Processes+log2(MessageSize)+I(log2(MessageSize)^2),
data=bcast_default.df %>% filter(Allocation == "core")
)
summary(fit.default.log.quadratic)
##
## Call:
## lm(formula = log2(Latency) ~ -1 + Processes + log2(MessageSize) +
## I(log2(MessageSize)^2), data = bcast_default.df %>% filter(Allocation ==
## "core"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.42018 -0.42153 -0.03617 0.38545 1.35757
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Processes 0.0525293 0.0017274 30.41 <2e-16 ***
## log2(MessageSize) -0.3153369 0.0129339 -24.38 <2e-16 ***
## I(log2(MessageSize)^2) 0.0363273 0.0007071 51.37 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6433 on 501 degrees of freedom
## Multiple R-squared: 0.9775, Adjusted R-squared: 0.9774
## F-statistic: 7267 on 3 and 501 DF, p-value: < 2.2e-16
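To compare the four quadratic models on a common footing, one can plug the printed coefficients into the back-transformed formula. A minimal base-R sketch: the coefficients are copied from the four `*.log.quadratic` summaries above, while `coefs`, `predict_latency()` and the query point (24 processes, MessageSize = 1024) are hypothetical choices for illustration:

```r
# Coefficients copied from the summary() outputs of the four *.log.quadratic fits
coefs <- data.frame(
  algorithm = c("linear", "chain", "binarytree", "default"),
  Processes = c(0.0718655, 0.0646477, 0.0600791, 0.0525293),
  log2M     = c(-0.3022613, -0.3262914, -0.3314744, -0.3153369),
  log2M_sq  = c(0.0355722, 0.0361133, 0.0370215, 0.0363273),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fitted log2(Latency) back-transformed with 2^(...)
# (vectorised over the rows of cf)
predict_latency <- function(cf, processes, message_size) {
  l2m <- log2(message_size)
  2^(cf$Processes * processes + cf$log2M * l2m + cf$log2M_sq * l2m^2)
}

# Hypothetical query point, chosen only for illustration
coefs$pred_latency <- predict_latency(coefs, processes = 24, message_size = 1024)
coefs[order(coefs$pred_latency), c("algorithm", "pred_latency")]
```

At this particular point the fitted models rank default < binarytree < chain < linear. This is a model prediction at a single configuration, not a measurement, and since the Processes coefficients differ across algorithms the ranking could change elsewhere.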
I propose just one example, for the linear algorithm.
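A hedged sketch of how a prediction from the log2 quadratic model can be produced with `predict()`. Since the ORFEO data cannot be reused here, it fits the same formula as `fit.linear.log.quadratic` to synthetic data generated from that model’s printed coefficients and residual standard error; the grid of Processes and MessageSize values, `sim`, `new_points` and `out` are all hypothetical. With the real data one would pass `fit.linear.log.quadratic` to `predict()` instead of `fit`:

```r
set.seed(42)

# Synthetic data (hypothetical, NOT the ORFEO measurements), generated from the
# fit.linear.log.quadratic coefficients plus Gaussian noise on the log2 scale
sim <- expand.grid(Processes = 2:48, MessageSize = 2^(0:20))
l2m <- log2(sim$MessageSize)
sim$Latency <- 2^(0.0718655 * sim$Processes - 0.3022613 * l2m +
                  0.0355722 * l2m^2 + rnorm(nrow(sim), sd = 0.7354))

# Same formula as fit.linear.log.quadratic
fit <- lm(log2(Latency) ~ -1 + Processes + log2(MessageSize) + I(log2(MessageSize)^2),
          data = sim)

# New configurations to predict (hypothetical values)
new_points <- data.frame(Processes = c(8, 24, 48), MessageSize = c(1, 1024, 2^20))

# predict() works on the log2 scale; 2^ maps fit, lwr and upr back to Latency
pred_log2 <- predict(fit, newdata = new_points, interval = "prediction", level = 0.95)
out <- cbind(new_points, 2^pred_log2)
out
```

`predict()` returns the columns fit, lwr and upr on the log2 scale; raising 2 to them gives a point estimate and a 95% prediction interval for Latency. If the log2-scale errors are symmetric, the back-transformed fit estimates the median latency rather than the mean.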